A RUN-TIME SIMULATION ENVIRONMENT FOR VOICEXML APPLICATIONS THAT
SIMULATES AND AUTOMATES USER INTERACTION 

BACKGROUND OF THE INVENTION 

Statement of the Technical Field 

[0001] The present invention relates to the field of computer speech recognition, text- 
to-speech technology and telephony, and more particularly to a system and method for 
a run-time simulation environment for voice applications that simulates and automates 
user interaction. 

Description of the Related Art 

[0002] Functionally testing voice applications presents many difficulties. In the case 
of a VoiceXML (VXML) application, a VXML interpreter communicates with a platform
that supplies the speech technology needed to test the application in real-time.
These speech technologies, such as an automatic speech recognition (ASR)
engine, or a text-to-speech (TTS) engine or converter, are generally very CPU intensive 
and expensive to build and install. In addition to the speech technologies, to test an
application, a tester must also provide the input to the application. This usually
requires a tester to physically perform the interaction, in the form of actual speech or 
key tone input, which may be cumbersome and difficult to provide. Having a person 
perform the input can be time consuming and costly. 

[0003] Furthermore, when testing a voice application, it can be difficult to mimic the 
true behavior of speech or audio input to the application, as well as any text-to-speech 
or pre-recorded audio output from the application. This is because voice applications 
are used in a run-time environment, and are therefore very "time-oriented." A user is 
generally required to supply an input to the application within a certain amount of time or 
else a "speech timeout" may occur. In addition, it may be useful to ascertain how
long it may take for a typical user to navigate through a voice application so as to 
assess the behavior and efficacy of the application. 

[0004] It would be desirable therefore to provide a testing environment that allows the 
simulation of user interaction as well as the simulation of the speech technology 
platform, such that a developer of voice applications will no longer be dependent on 
human testers and speech technology and hardware to test their applications. The 
testing environment would therefore be a "simulation environment" that would 
adequately replace the user and speech technologies. It would further be desirable to 
provide a simulation environment that simulates the actual execution time of a user
interaction with the voice application, as if real input and output were occurring. A 
system and method is therefore needed to simulate that real-time execution. 

SUMMARY OF THE INVENTION

[0005] The present invention addresses the deficiencies of the art in respect to 
testing voice applications and provides a novel and non-obvious method, system and 
apparatus for a run-time simulation environment for voice applications that simulates 
and automates user interaction. 

[0006] Methods consistent with the present invention provide a method for simulating 
a run-time user interaction with a voice application. A user simulation script 
programmed to specify simulated voice interactions with the voice application is loaded. 
The voice application is first processed to derive a nominal output of the voice 
application. The user simulation script is second processed to generate both a 
simulated output for the voice application corresponding to the nominal output and a 
simulated input for the voice application corresponding to a pre-determined user input to 
the voice application. 

[0007] Systems consistent with the present invention include a simulation tool for 
simulating a run-time user interaction with a voice application running on an application 
server. The tool is configured to load a user simulation script programmed to specify 
simulated voice interactions with the voice application. The tool is further configured to: 

(i) process the voice application to derive a nominal output of the voice application; and 

(ii) process the user simulation script to generate a simulated output for the voice
application corresponding to the nominal output, and to generate a simulated input for
the voice application corresponding to a pre-determined user input to the voice
application.

[0008] Additional aspects of the invention will be set forth in part in the description 
which follows, and in part will be obvious from the description, or may be learned by 
practice of the invention. The aspects of the invention will be realized and attained by 
means of the elements and combinations particularly pointed out in the appended 
claims. It is to be understood that both the foregoing general description and the 
following detailed description are exemplary and explanatory only and are not restrictive 
of the invention, as claimed. 

BRIEF DESCRIPTION OF THE DRAWINGS



[0009] The accompanying drawings, which are incorporated in and constitute part of 
the specification, illustrate embodiments of the invention and together with the 
description, serve to explain the principles of the invention. The embodiments 
illustrated herein are presently preferred, it being understood, however, that the 
invention is not limited to the precise arrangements and instrumentalities shown, 
wherein: 

[0010] Figure 1 is a conceptual drawing of the present invention which provides a 
user interaction simulation environment for a voice application; 

[0011] Figure 2 is a block diagram showing the arrangement of elements in a system 
assembled in accordance with the principles of the present invention for simulating a 
run-time environment with a voice application; and 

[0012] Figure 3 is a flowchart illustrating a process for simulating a run-time user 
interaction with a voice application. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS



[0013] The present invention is a system and method for simulating a run-time user 
interaction with a voice application. Figure 1 is a conceptual drawing of the present 
invention which provides a user interaction simulation environment for a voice 
application. The simulation environment 100 of the present invention includes a 
simulation tool 101 that is coupled to a voice application 105. The simulation tool 101 
uses a simulation script 110 that provides a set of specified inputs and outputs to and 
from the voice application, to simulate a real-time interaction by a user with the voice 
application. The simulation tool 101 and script 110 replace the actual inputs that may 
be provided by a live user, and replace the actual outputs that may be provided by the 
voice application 105 and all the speech technologies that are otherwise coupled to a
conventional voice application. All interactions between the user and the voice 
application instead occur between the tool 101 and the application 105, where audible 
sound, keypad tones, pauses, hang-ups, and the like are instead represented by 
scripted text-equivalents that simulate both the content and execution time of such 
interactions. 

[0014] As used herein, a "voice application" shall mean any logic permitting user 
interaction through a voice driven user interface, such as a mark-up language 
specification for voice interaction with some form of coupled computing logic. One 
example of a voice application is an application written in Voice Extensible Mark-up 
Language, or "VoiceXML." However, it is readily understood that VoiceXML
applications are not the only type of voice applications, and any reference to the term 
"VoiceXML application" herein shall encompass all voice applications. 

[0015] In conventional voice systems, the voice application itself receives the
"outputs" it generates to users from various speech technologies coupled to the voice 
application. For example, the voice application can receive an input from the user, and 
can record the input with an audio device, or convert the spoken word input into text 
using an automatic speech recognition engine. The voice application can then playback 
the recorded audio to the user as a prompt, or may convert a text stream to audio using 
the text-to-speech capabilities of a speech technologies platform, either of which may 
be sent as another "output" to the user. 

[0016] Heretofore, to test a voice application, all of the foregoing speech processing 
elements are needed. The present invention replaces a number of those elements, by 
providing a simulation environment that allows a voice application to be executed in 
real-time, and that supplies and simulates the execution time of the inputs and outputs 
that flow to and from the voice application. This provides for a realistic, cost-effective 
testing environment that will greatly increase the ease and efficacy of developing voice 
applications. 

[0017] Figure 2 is a block diagram showing the arrangement of elements in a system 
200 assembled in accordance with the principles of the present invention for simulating 
a run-time environment with a voice application. The system 200 includes a VoiceXML 
Application 201 (which may be running on an application server) that can be interpreted by
an interpreter 202. The system 200 also includes a simulation script 205 that can be 
interpreted by a second interpreter 210. The second interpreter 210 may reside on a 
separate piece of hardware, or may be resident on the same hardware as the voice 
application 201 and interpreter 202. 

[0018] The simulation environment 200 can process customized mark-up language
documents which describe the user interaction or the user experience with the 
environment itself. Specifically, the mark-up language documents describe the set of 
operations a user might take as a transcript of what occurs when interacting with the 
voice application. In this regard, what is desired to be simulated is the behavior
between the user and the voice application, which is provided by the simulation script 
205 written in the customized mark-up language, which, by way of non-limiting example, 
may be called a "Voice User Interaction Extensible Mark-up Language," or "VuiXML." 
The user behavior, as well as the prompts and outputs supplied from the voice 
application itself, is mimicked and embodied in the user simulation script 205. The user 
simulation script 205 can be a script that describes how the user interacts with the 
system. Common interaction behaviors can include voice responses, input in the form of
digits, pauses between spoken words, hang-up operations, and other typical inputs that
a user would make when interacting with a voice response system. This user interaction is
embodied in the script 205. 
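
By way of illustration only, the interaction behaviors listed above can be held in a small data structure. The specification does not define the syntax of the customized mark-up language, so the sketch below uses plain Python objects instead. The class name, the field names, the step labels, and the durations are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScriptStep:
    """One scripted user behavior (hypothetical structure, not VuiXML syntax)."""
    kind: str              # assumed labels: "voice", "digits", "pause", "hangup"
    text: str = ""         # text equivalent of what the user says or keys in
    seconds: float = 0.0   # assumed time the step would take in a live call


def total_duration(steps: List[ScriptStep]) -> float:
    """Sum the assumed execution time of every step in the script."""
    return sum(step.seconds for step in steps)


# Illustrative script: key in a PIN, pause, say "One", then hang up.
banking_script = [
    ScriptStep("digits", "3497", 2.0),
    ScriptStep("pause", "", 1.0),
    ScriptStep("voice", "One", 0.5),
    ScriptStep("hangup"),
]
```

Each step carries both a text equivalent and a duration, which mirrors the idea that the script represents the content and the execution time of an interaction.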

[0019] In addition to the script 205, an interpreter 210 can be included. The 
interpreter 210 processes the simulation script 205 and interacts with the VoiceXML 
interpreter 202. The interaction between the script 205 and VoiceXML application 201 
uses only text-based results, and dispenses with the need for actual human or
machine-generated audio input or output. Because what the user is going to do is
known in advance, the script 205 can be pre-developed. Nevertheless, the script 205
flows in real-time, describes what a user is doing sequentially, and supplies the
outputs and prompts from a voice application in real-time, so as to better test and
develop the application.

[0020] Figure 3 is a flowchart illustrating a process for simulating a run-time user 
interaction with a voice application. First, the voice application browser, such as a
VoiceXML browser, is called in step 301. Next, in step 305, a user simulation script is 
provided and supplied to the simulation environment. Subsequently, the voice 
application is processed in step 310. 

[0021] The voice application normally generates one or more outputs, which, in 
conventional systems, may be prompts, synthesized text to speech, pre-recorded audio, 
and the like. However, in the simulation environment of the present invention, all such 
outputs are text based, and are initially "nominal" outputs: the outputs that the voice 
application would otherwise provide to a user in the non-simulated environment. Within 
the simulation environment, the actual outputs for the voice application are instead 
generated by the user simulation script, which generates a simulated output for the 
voice application corresponding to the "nominal" output. This occurs in step 315. 

[0022] In step 320, the process next determines whether the voice application 
requires a user input. Should the voice application require a user input, the user 
simulation script is processed in step 325 to generate a simulated input for the voice 
application corresponding to a pre-determined user input to the voice application. As 
stated above, all such input is pre-developed and supplied in the user simulation script. 
The process may then choose to continue after assessing whether additional 
processing of the voice application is necessary in step 330, or may terminate if
execution of the voice application is complete.
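
The flow of steps 310 through 330 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the implementation of the invention: the voice application is reduced to a hypothetical list of dialog turns, and the function and variable names are invented for the example. Steps 301 and 305 (calling the browser and supplying the script) are represented only by the two arguments.

```python
from typing import List, Tuple

# Hypothetical stand-in for a voice application: each dialog turn is a prompt
# (the nominal output) and a flag saying whether user input is required.
Turn = Tuple[str, bool]


def run_simulation(app_turns: List[Turn],
                   scripted_inputs: List[str]) -> List[Tuple[str, str]]:
    """Walk the application turn by turn and log each text-based event."""
    log: List[Tuple[str, str]] = []
    inputs = iter(scripted_inputs)
    for prompt, needs_input in app_turns:            # step 310: process the application
        log.append(("nominal output", prompt))
        log.append(("simulated output", prompt))     # step 315: script supplies the output
        if needs_input:                              # step 320: is user input required?
            user_input = next(inputs, None)          # step 325: script supplies the input
            if user_input is None:
                raise ValueError("script has no input left for: " + prompt)
            log.append(("simulated input", user_input))
    return log                                       # step 330: no further processing needed


turns = [
    ("Welcome to Bank, Enter PIN", True),
    ("Say 'One' for Account...", True),
    ("Your Account Balance is...", False),
]
log = run_simulation(turns, ["3497", "One"])
```

The sketch raises an error when the script runs out of inputs, which is one possible way to surface a mismatch between the script and the application; the specification does not say how such a case is handled.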

[0023] The advantage and utility of the present invention is that the user simulation 
script generates simulated outputs and inputs in real-time, and not only provides the 
simulated text of the nominal outputs and user inputs to a voice application, but 
provides them at a time rate which closely mimics a run-time interaction. One example 
of a run-time user interaction is given below in Table 1, which simulates a user
interaction with a voice application written for allowing a user to interact with a voice 
portal that provides banking services. 



TABLE 1

Event Sequence                    Type of Input/Output        Script       Replaces
1: "Welcome to Bank, Enter PIN"   Nominal Output              VoiceXML     N.A.
2: "Welcome to Bank, Enter PIN"   Simulated Output            Simulation   Audio
3: "3497"                         Pre-determined User Input   N.A.         N.A.
4: "3497"                         Simulated Input             Simulation   Telephony Sub-system
5: "Say 'One' for Account..."     Nominal Output              VoiceXML     N.A.
6: "Say 'One' for Account..."     Simulated Output            Simulation   TTS
7: "One"                          Pre-determined User Input   N.A.         N.A.
8: "One"                          Simulated Input             Simulation   ASR
9: "Your Account Balance is..."   Nominal Output              VoiceXML     N.A.
10: "Your Account Balance is..."  Simulated Output            Simulation   TTS



[0024] A sequence of nominal and simulated events in the run-time simulation 
environment of the present invention is shown in Table 1. The sequence begins by 
processing the voice application, which in this case is a VoiceXML application, to derive 
a nominal output from the application, which would prompt a user to enter a PIN code. 
The script providing this nominal output is the VoiceXML application itself. This is 
indicated in the "Script" column of Table 1. To simulate the time it would take an
application server to execute such a nominal output in real-time, the user simulation
script is processed to generate a simulated output, which has the same text
corresponding to the nominal output at event 1 in the sequence of Table 1. The
simulated output is run at a rate that simulates the length of time a system would take to
execute that output. This output is generated by the simulation script, and not the voice
application script. The simulation script therefore replaces the pre-recorded or
synthesized audio device that would otherwise be supplied through the VoiceXML script
to prompt the user. This is indicated in the "Replaces" column of Table 1.
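
One way to picture a simulated output that runs at a realistic rate is to estimate how long a prompt would take to speak and to advance a clock by that amount. The sketch below is an assumption-laden illustration: the speaking rate of 150 words per minute, the word-count model, and all names are hypothetical. It also uses a virtual clock rather than real waiting so that it runs instantly; a real-time variant could call time.sleep for the same number of seconds.

```python
class VirtualClock:
    """Tracks simulated elapsed time in seconds without actually sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


def prompt_duration(text: str, words_per_minute: float = 150.0) -> float:
    """Assumed model: seconds needed to speak `text` at a fixed word rate."""
    return len(text.split()) * 60.0 / words_per_minute


def play_simulated_output(clock: VirtualClock, text: str) -> float:
    """Emit the text equivalent of a prompt and charge its duration to the clock."""
    clock.advance(prompt_duration(text))
    return clock.now


clock = VirtualClock()
elapsed = play_simulated_output(clock, "Welcome to Bank, Enter PIN")
```

With the assumed rate, the five-word prompt of event 1 accounts for two seconds of simulated time.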

[0025] Similar pairs of nominal and simulated inputs and outputs are
provided in events 3 through 10 of the sequence of Table 1. In event 4, the simulation
script provides a simulated input corresponding to the pre-determined user input in the 
form of digits that replace the input that would have otherwise been supplied through a 
software-telephony sub-system in a conventional testing environment. Next, the 
simulation script simulates a VoiceXML prompt for an account code number, by 
replacing the TTS engine that would otherwise be used to supply the prompt. Another 
predetermined user input is provided in event 8 as a simulation of the ASR engine, 
which would otherwise recognize a user's nominal input of the spoken word "One" at 
event 7. The final simulation would be of another text-to-speech-synthesized nominal
output that is simulated by an additional processing of the user simulation script to 
provide a text equivalent in real-time of the message "Your account balance is...." 

[0026] As shown in Table 1, the entire user interaction with the VoiceXML application
is simulated by the simulation script running in real-time with the voice application. As 
shown in FIG. 2, a simulation script interpreter 210 interpreting the customized mark-up 
language of the script 205 is coupled to a voice application interpreter 202 interpreting 
the VoiceXML application. In events 2, 6, and 10 of Table 1, the simulation script 
provides a simulated output that simulates a text equivalent and an execution time for 
the nominal outputs in events 1, 5, and 9, respectively. And in events 4 and 8 of Table 
1, the simulation script provides a simulated input that simulates a text equivalent and 
an execution time for the pre-determined user inputs in events 3 and 7, respectively. 
The present invention thereby allows a developer of a voice application to test the 
application by simulating the real-time flow of events between a user and a voice 
application. The simulated inputs and outputs are executed in conjunction with the 
voice application in real-time to test the application. This greatly aids in developing the 
voice application. 
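
The pairing described in this paragraph can be checked mechanically. The following sketch encodes the event numbers, text, and types of Table 1 and maps each simulated event to the event it mirrors. The tuple layout and the function name are assumptions made for this example.

```python
# (event number, text, type of input/output), transcribed from Table 1.
EVENTS = [
    (1, "Welcome to Bank, Enter PIN", "Nominal Output"),
    (2, "Welcome to Bank, Enter PIN", "Simulated Output"),
    (3, "3497", "Pre-determined User Input"),
    (4, "3497", "Simulated Input"),
    (5, "Say 'One' for Account...", "Nominal Output"),
    (6, "Say 'One' for Account...", "Simulated Output"),
    (7, "One", "Pre-determined User Input"),
    (8, "One", "Simulated Input"),
    (9, "Your Account Balance is...", "Nominal Output"),
    (10, "Your Account Balance is...", "Simulated Output"),
]


def pair_events(events):
    """Map each simulated event number to the preceding event it mirrors."""
    pairs = {}
    for (prev_no, prev_text, prev_kind), (no, text, kind) in zip(events, events[1:]):
        is_simulated = kind.startswith("Simulated")
        mirrors_previous = text == prev_text and not prev_kind.startswith("Simulated")
        if is_simulated and mirrors_previous:
            pairs[no] = prev_no
    return pairs


pairs = pair_events(EVENTS)
```

For the sequence of Table 1 this yields the pairs 2 and 1, 4 and 3, 6 and 5, 8 and 7, and 10 and 9, matching the correspondence stated above.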

[0027] Another advantage of the present invention is that, in a simulation 
environment, the system should be vendor-agnostic, in that the system should not have 
to handle the various behaviors of different speech technology platforms. The 
environment should instead be focused on simulating the voice application itself. Thus, 
the system 200 dispenses with the need for the speech technologies and their attendant
devices, as well as for any network and telephony sub-system. The present invention 
therefore provides a robust simulation environment that does not depend on the various
components generally used to implement and execute a voice application. 

[0028] The present invention can be realized in hardware, software, or a combination 
of hardware and software. An implementation of the method and system of the present 
invention can be realized in a centralized fashion in one computer system, or in a 
distributed fashion where different elements are spread across several interconnected 
computer systems. Any kind of computer system, or other apparatus adapted for 
carrying out the methods described herein, is suited to perform the functions described 
herein. 

[0029] A typical combination of hardware and software could be a general purpose 
computer system with a computer program that, when being loaded and executed, 
controls the computer system such that it carries out the methods described herein. 
The present invention can also be embedded in a computer program product, which 
comprises all the features enabling the implementation of the methods described 
herein, and which, when loaded in a computer system is able to carry out these 
methods. 

[0030] Computer program or application in the present context means any 
expression, in any language, code or notation, of a set of instructions intended to cause 
a system having an information processing capability to perform a particular function 
either directly or after either or both of the following: a) conversion to another language,
code or notation; b) reproduction in a different material form. Significantly, this invention 
can be embodied in other specific forms without departing from the spirit or essential 
attributes thereof, and accordingly, reference should be had to the following claims,
rather than to the foregoing specification, as indicating the scope of the invention. 